Day 12: OCR Tool Selection and Workflow

17th Ironman Contest tesseract easyocr paddleocr google colab

Yi-Ping, Fang

2025-08-12 09:38:11

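Describing a building through drawings often requires extracting key information from different drawing types such as floor plans, elevations, and sections. For example, to quickly find the overall height of a building, you have to go back to the elevation and look for the annotation. Architectural drawings therefore call for more than recognizing graphical elements: recognizing key text is an equally indispensable part of digital automation.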

12.1. Common OCR Tools

Table 12.1 Comparison of common OCR tools

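Tool	Strengths	Weaknesses	Best suited for
Tesseract	Multilingual; extensive technical resources; the longest development history	Weaker Chinese support	Simple document text
EasyOCR	Lightweight; quick and simple to install; multilingual	Average accuracy on Chinese	Rapid demo development
PaddleOCR	Multilingual (especially well suited to Chinese); supports correction of complex images; continuously updated by Baidu	Requires basic Python; many parameters; custom training has a learning curve	Complex layouts and batch automation

Based on the comparison above, this series uses PaddleOCR for the implementation, because architectural drawings involve heavy batch processing and particularly need Chinese text recognition.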

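12.2. PaddleOCR Workflow in Practice

First, open the Colab page at https://colab.research.google.com/ and create a new notebook.
PaddleOCR and PaddlePaddle have strong version dependencies, and the NumPy preinstalled on Colab is usually 2.x (whereas many stable PaddlePaddle releases have supported only NumPy < 2.0). If you install directly in Colab with: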

!pip install paddlepaddle
!pip install paddleocr

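installing this way easily leads to compatibility errors with other packages or to poor results at run time, so it is recommended to install the tested versions listed below.
Step 1: Install paddleocr and the required packages.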

# 1. Install paddleocr and the required packages (versions tested to work on Colab)
!pip install --upgrade pip
!pip install paddlepaddle==3.1.0
!pip install paddleocr==2.10.0  # supports PP-OCRv4
!pip install pillow==11.1.0
!pip install opencv-python==4.12.0.88
!pip install numpy==2.2.6

# After everything is installed, be sure to restart the runtime (Runtime → Restart)!
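
After the runtime restarts, it may be worth confirming that the pinned versions are the ones actually installed. The following is a minimal sketch that uses only the standard library's importlib.metadata; the helper names installed_versions and report_mismatches are illustrative and not part of PaddleOCR, and the PINNED table simply repeats the versions from Step 1.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned in Step 1, keyed by pip distribution name.
PINNED = {
    "paddlepaddle": "3.1.0",
    "paddleocr": "2.10.0",
    "pillow": "11.1.0",
    "opencv-python": "4.12.0.88",
    "numpy": "2.2.6",
}


def installed_versions(names):
    """Return {name: installed version, or None if the package is missing}."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def report_mismatches(pinned):
    """Return {name: (expected, actual)} for packages that differ from the pins."""
    actual = installed_versions(pinned)
    return {
        name: (want, actual[name])
        for name, want in pinned.items()
        if actual[name] != want
    }


if __name__ == "__main__":
    # Prints nothing when every pinned package matches.
    for name, (want, got) in report_mismatches(PINNED).items():
        print(f"{name}: expected {want}, found {got}")
```

If a package is listed here, another install probably replaced it after Step 1, and rerunning the pinned install followed by a restart is the likely remedy.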

Step 2: Upload the images.

# 2. Upload the images
from google.colab import files
uploaded = files.upload()
img_paths = list(uploaded.keys())
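
The upload dialog accepts any file type, so a non-image file would later reach the OCR call. As a hedged sketch, the uploaded names could be filtered by extension first; the helper keep_images and the IMAGE_EXTS set are assumptions made for illustration, not something the workflow itself requires.

```python
from pathlib import Path

# Extensions treated as images here; an assumption, adjust to your drawings.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}


def keep_images(names):
    """Return the names whose extension looks like an image, preserving order."""
    return [n for n in names if Path(n).suffix.lower() in IMAGE_EXTS]
```

In the notebook this would be used as img_paths = keep_images(uploaded.keys()). Note that Step 3 rebuilds img_paths from uploaded, so the same filter would need to be applied there as well.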

Step 3: Run recognition and download the output.

# 3. Run recognition and download the output
from paddleocr import PaddleOCR
from google.colab import files
import zipfile
import os
import json

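img_paths = list(uploaded.keys())
# PaddleOCR 2.x enables the angle classifier with use_angle_cls
# (use_textline_orientation is the 3.x parameter name)
ocr = PaddleOCR(lang='ch', use_angle_cls=True)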

txt_files = []
json_files = []

for img_path in img_paths:
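    result = ocr.ocr(img_path, det=True, rec=True, cls=True)
    # result[0] is None when no text is detected, so fall back to an empty list
    lines = result[0] or []

    # Save the recognized text as txt, one text line per row
    txt_out = f"{img_path}_ocr.txt"
    with open(txt_out, "w", encoding="utf-8") as f:
        for line in lines:
            f.write(line[1][0] + '\n')
    txt_files.append(txt_out)

    # Save json (including the center point of each box)
    data = []
    for line in lines: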
        box = line[0]
        text = line[1][0]
        conf = line[1][1]
        x_c = sum([p[0] for p in box]) / 4
        y_c = sum([p[1] for p in box]) / 4
        data.append({
            "text": text,
            "box": box,
            "center": [x_c, y_c],
            "confidence": conf
        })
    json_out = f"{img_path}_ocr.json"
    with open(json_out, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=2)
    json_files.append(json_out)

# Pack all txt and json files into one zip
zip_filename = "ocr_results.zip"
with zipfile.ZipFile(zip_filename, "w") as zf:
    for file in txt_files + json_files:
        zf.write(file)
        os.remove(file)  # clean up

files.download(zip_filename)
print("All done! (txt + json included)")
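
To show how the JSON output might be used afterwards (for example, locating a level label on an elevation and the dimension next to it), here is a minimal sketch. It relies only on the text, center, and confidence fields written in Step 3; the helper names, the 0.8 confidence threshold, and the number pattern are assumptions made for illustration.

```python
import json
import re

# Matches plain numbers such as "3600" or "12.5"; an illustrative pattern only.
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def load_records(json_path):
    """Load the list of OCR records from one *_ocr.json file."""
    with open(json_path, encoding="utf-8") as f:
        return json.load(f)


def find_text(records, keyword, min_conf=0.8):
    """Return records whose text contains keyword with confidence >= min_conf."""
    return [
        r for r in records
        if keyword in r["text"] and r["confidence"] >= min_conf
    ]


def nearest_number(records, anchor):
    """Return the number-bearing record whose center is closest to anchor's center."""
    ax, ay = anchor["center"]
    candidates = [
        r for r in records
        if r is not anchor and NUMBER.search(r["text"])
    ]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda r: (r["center"][0] - ax) ** 2 + (r["center"][1] - ay) ** 2,
    )
```

Because Step 3 deletes the individual files after zipping them, load_records would be run on a file extracted from ocr_results.zip. Picking the nearest number by center distance is a simple heuristic; dense drawings may need a stricter rule, such as restricting candidates to the same row.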

Step 4: Upload a Chinese font file. (For example, the DFKai-SB font (標楷體) is located at C:\Windows\Fonts\kaiu.ttf by default.)

from google.colab import files
# Use a separate variable so the image upload from Step 2 is not overwritten
font_uploaded = files.upload()

font_path = list(font_uploaded.keys())[0]
print(f"Font path: {font_path}")

Step 5: Visualize the results and download them.

# 5. Visualize the results and download them
from paddleocr import draw_ocr
from PIL import Image
import matplotlib.pyplot as plt
from google.colab import files
import zipfile
import os

vis_files = []

for img_path in img_paths:
    result = ocr.ocr(img_path, det=True, rec=True, cls=True)
    # result[0] is None when no text is detected, so fall back to an empty list
    lines = result[0] or []
    image = Image.open(img_path).convert('RGB')
    boxes = [line[0] for line in lines]
    txts = [line[1][0] for line in lines]
    scores = [line[1][1] for line in lines]
    im_show = draw_ocr(image, boxes, txts, scores, font_path=font_path)
    im_show = Image.fromarray(im_show)

    # Save the visualization image
    vis_out = f"{img_path}_vis.png"
    im_show.save(vis_out)
    vis_files.append(vis_out)

    # Preview directly in Colab
    plt.figure(figsize=(14, 8))
    plt.imshow(im_show)
    plt.axis('off')
    plt.title(f'OCR result: {img_path}')
    plt.show()

# Pack all PNG files into one zip
zip_filename = "ocr_visual_results.zip"
with zipfile.ZipFile(zip_filename, "w") as zf:
    for vis_file in vis_files:
        zf.write(vis_file)
        os.remove(vis_file)  # clean up temporary files

files.download(zip_filename)
print("All visualization PNGs have been zipped and downloaded!")